Using multiple linguistic features for Mandarin phrase break prediction in maximum-entropy classification framework

Authors

  • Yu Zheng
  • Gary Geunbae Lee
  • Byeongchang Kim
Abstract

We model Mandarin phrase break prediction as a classification problem over a three-level prosodic structure and apply conditional maximum entropy (ME) classification to it. We acquire multiple levels of linguistic knowledge from an annotated corpus and integrate them as features in the maximum entropy framework. Five kinds of features are used to represent various linguistic constraints: POS tag features, lexical features, phonetic features, length features, and distance features. Experimental results show that our method performs better than previous methods and that the conditional ME model is very effective against the data sparseness problem in Mandarin phrase break prediction.
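As a rough illustration of the setup the abstract describes, the sketch below extracts a few boundary features and computes a conditional ME distribution over three break labels. It is not the authors' implementation: the label names, the feature templates, and the functions `extract_features` and `maxent_probs` are assumptions made for this example, phonetic features are left out because they would need a pronunciation lexicon, and the weights are supplied by hand rather than trained.

```python
import math

# Hypothetical label set for a three-level prosodic structure.
LABELS = ("no_break", "minor_break", "major_break")


def extract_features(words, pos_tags, i):
    """Return binary feature strings for the boundary after word i.

    Covers four of the five feature kinds named in the abstract
    (POS, lexical, length, distance); the templates are illustrative.
    """
    return [
        f"pos_left={pos_tags[i]}",         # POS tag feature
        f"pos_right={pos_tags[i + 1]}",    # POS tag feature
        f"word_left={words[i]}",           # lexical feature
        f"len_left={len(words[i])}",       # length feature (characters)
        f"dist_start={i + 1}",             # distance from sentence start
        f"dist_end={len(words) - i - 1}",  # distance to sentence end
    ]


def maxent_probs(feats, weights):
    """Conditional ME model: p(y | x) is proportional to exp(sum of weights).

    `weights` maps (feature, label) pairs to real values; pairs that are
    missing contribute zero.
    """
    scores = {y: sum(weights.get((f, y), 0.0) for f in feats) for y in LABELS}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}


if __name__ == "__main__":
    words = ["我们", "今天", "去", "公园"]
    pos_tags = ["PN", "NT", "VV", "NN"]
    feats = extract_features(words, pos_tags, 1)
    # Hand-set weight, purely for demonstration.
    weights = {("pos_left=NT", "major_break"): 2.0}
    print(maxent_probs(feats, weights))
```

In a real system the weights would be estimated from the annotated corpus, for example by iterative scaling or gradient-based optimization of the conditional log-likelihood.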


Similar resources

Incorporating second-order information into two-step major phrase break prediction for Korean

In this paper, we present a new phrase break prediction method that integrates second-order information into a general maximum entropy model. The phrase break prediction problem was mapped into a classification problem in our research. The features we used for the prediction of phrase breaks are of several layers, such as local features (part-of-speech (POS) tags, a lexicon, lengths of eojeols and...


Phrase break prediction using logistic generalized linear model

In this paper we propose a novel phrase break prediction model for Mandarin speech synthesis: a generalized linear model (GLM) fitted by stepwise regression. We assume that phrase breaks follow a Bernoulli distribution and model the phrase break probability with a logistic GLM. The attribute set is selected automatically by stepwise regression, which is a fully data-driven method. We also introd...


Chinese prosody phrase break prediction based on maximum entropy model

A maximum entropy based model for prosody phrase break prediction was proposed in this paper, and a comparison was conducted on large corpora between the new model and the decision tree based model which was the mainstream method for prosody phrase break prediction. The contribution of lexical information and influences of different cutoff values were also investigated. It was demonstrated that...


Phrase Break Prediction Using a Finite State Transducer

This paper presents a method for phrase break prediction using a finite state transducer. In the literature, several algorithms have been proposed using statistical techniques for predicting phrase breaks. Some of these methods rely on linguistic information, such as syllables, words, part-of-speech, accents, etc. Our proposal is a probabilistic finite state transducer to convert part-of-speech ...




Publication date: 2004